To investigate the effect of violating the assumption of equal item
difficulty on the Kuder-Richardson (KR) Formula 21 reliability
coefficient, 670 eighth- and ninth-grade students were administered
26 short, homogeneous "tests" of mathematics concepts and skills.
Both KR Formula 20 and KR Formula 21 were used to estimate reliability
on each test. The 26 tests were sorted into a high item difficulty
variability group and a low item difficulty variability group, and the
magnitudes of the differences between KR20 and KR21 reliability
coefficients were compared for the two groups. The difference between
KR20 and KR21 reliability coefficients was significantly greater when
the range of item difficulty values was .30 or more. Nevertheless,
KR21 was a good estimate of KR20 when the range of item difficulty was
relatively narrow. Implications for test selection are suggested. When
KR21 has been used to estimate a test's reliability, the user should
note that the coefficient is a lower bound of internal consistency
reliability, particularly when the item difficulty range is great.
(Author/GDC)

Differences Between Kuder-Richardson Formula 20 and Formula 21
Reliability Coefficients for Short Tests with Different Item Variabilities¹

Of the various statistical methods for estimating the internal consistency
reliability of a test, the reliability estimates developed by Kuder and
Richardson (1937) have been widely used by test makers. The use of the
Kuder-Richardson reliability estimates requires only the administration of
a single test and does away with any biases that might arise when a test is
split in any one of a number of ways, as in the split-half method. The two
primary sources of error variance considered in the Kuder-Richardson
formulas are content sampling and heterogeneity of the measured trait, and
their assumptions call for test items of equal, or nearly equal, difficulty
and intercorrelation.


The most accurate Kuder-Richardson formula, known as K-R 20, can be
expressed as follows:

$$r_{tt} = \left(\frac{n}{n-1}\right)\left(\frac{\sigma_t^2 - \sum pq}{\sigma_t^2}\right),$$

where $n$ = the number of items in the test;
      $p$ = the proportion of correct responses to each item;
      $q$ = the proportion of incorrect responses to each item ($1 - p$); and
      $\sigma_t^2$ = the variance of the distribution of the test scores.

An approximation to K-R 20, which assumes that all items in the test have
approximately the same difficulty, calls for less information and is
therefore much easier to calculate by hand. The simpler formula, known
as K-R 21, can be expressed as follows:

$$r_{tt} = \left(\frac{n}{n-1}\right)\left(\frac{\sigma_t^2 - n\bar{p}\bar{q}}{\sigma_t^2}\right),$$

where $\bar{p}$ = the average proportion of correct responses to each item; and
      $\bar{q}$ = the average proportion of incorrect responses to each item ($1 - \bar{p}$).

¹ Paper presented at the Annual Meeting of the American Educational
Research Association, April 4-7, 1977, New York.


Although the reliability estimate from K-R 21 is generally lower than that
obtained from K-R 20 for a given set of items administered to a given group
of examinees, test makers still report K-R 21 reliability coefficients,
since deviations from the assumption of equal item difficulty result in a
reduction of the coefficient, and hence in a lower bound of internal
consistency reliability.
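
The relationship between the two formulas can be checked numerically. Because $\sum pq = n\bar{p}\bar{q} - n s_p^2$, where $s_p^2$ is the variance of the item difficulties, K-R 20 exceeds K-R 21 by $n^2 s_p^2 / [(n-1)\sigma_t^2]$, a quantity that is zero only when all items are equally difficult. The following is a minimal sketch in Python, not the procedure used in the study: the response matrix is hypothetical, the function names are illustrative, and the use of the population variance (dividing by the number of examinees) is an assumption.

```python
from statistics import mean, pvariance


def item_difficulties(responses):
    """Proportion of examinees answering each item correctly (p per item)."""
    n_items = len(responses[0])
    return [mean(row[i] for row in responses) for i in range(n_items)]


def total_score_variance(responses):
    """Variance of total test scores (population variance: an assumption)."""
    return pvariance([sum(row) for row in responses])


def kr20(responses):
    """K-R 20 from a matrix of 0/1 item scores (rows are examinees)."""
    n = len(responses[0])
    sum_pq = sum(p * (1 - p) for p in item_difficulties(responses))
    var_t = total_score_variance(responses)
    return (n / (n - 1)) * (var_t - sum_pq) / var_t


def kr21(responses):
    """K-R 21, which replaces the sum of pq with n * mean(p) * mean(q)."""
    n = len(responses[0])
    p_bar = mean(item_difficulties(responses))
    var_t = total_score_variance(responses)
    return (n / (n - 1)) * (var_t - n * p_bar * (1 - p_bar)) / var_t


def kr_gap(responses):
    """K-R 20 minus K-R 21, computed from the spread of item difficulties."""
    n = len(responses[0])
    var_p = pvariance(item_difficulties(responses))
    return n ** 2 * var_p / ((n - 1) * total_score_variance(responses))


# Hypothetical data: 6 examinees by 4 items, item difficulties .83 to .17.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(round(kr20(responses), 3), round(kr21(responses), 3))
print(round(kr20(responses) - kr21(responses), 3), round(kr_gap(responses), 3))
```

With these made-up responses the two coefficients differ noticeably, and the difference agrees with the value obtained from the variance of the item difficulties alone.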


Of course, both of the Kuder-Richardson formulas presented here are
based on the assumptions that a uni-factor trait is being measured and
that the test is made up of parallel items. Studies carried out to
determine the robustness of K-R 20 under violation of the uni-factor
assumption indicated that K-R 20 showed little bias when the tests were
relatively long (more than 18 items) and when the item intercorrelations
were low (less than .6) (Brogden, 1946).
Although Kuder and Richardson (1937) assumed in the derivation of their
formulas that inter-item correlations were equal, Jackson and Ferguson
(1941) claimed that the only necessary assumption was that the average
covariance between parallel items be equal to the average covariance
between nonparallel items. For the most part, K-R 20 is considered to be
a satisfactory estimate of internal consistency reliability, if not a
lower bound.


Given the indications that K-R 20 is quite robust under violations
of its assumptions, the purpose of the present study was to investigate
the robustness of K-R 21 as an estimate of K-R 20 under violations of
the additional assumption of equal item difficulty. The results of the
study may also help to shed some light on test selection procedures when
such procedures include the examination of reliability coefficients.


The data for the present study were obtained as part of the Equating
of Forms Phase of the National Standardization Research Program for the
1976 edition of Stanford Diagnostic Mathematics Test (SDMT) conducted in
the fall of 1975. Five school systems participated in this research phase
and administered both of two parallel forms, A and B, of SDMT to the same
students within a three-week period. Since the order of administration of
the two forms was counterbalanced by classroom to control for practice
effects, the administration of the two parallel forms to the same students
can be thought of as one long test. The data presented in this paper are
limited to the approximately 625 eighth-grade and high school students
completing the Blue Level of this "long" test.

Stanford Diagnostic Mathematics Test is designed to measure competence
in the basic skills and concepts that are important in daily affairs and
prerequisite to the continued study of mathematics and, as such, can be
used as an effective instructional tool. Therefore, in addition to the
reporting of scores on each of its three subtests, Number System and
Numeration, Computation, and Applications, scores are also reported on
mutually exclusive groups of items within each subtest. These groups of
items are referred to as Concept/Skill Domains. There are 13 such domains
on each form of the test, or 26 in all. Since scores are routinely reported
for each of these domains and educational decisions are made on the basis
of these scores, it was imperative that reliability estimates be obtained
for these "short tests." Alternate-forms reliability for the domains has
been discussed elsewhere (Oswald & Lenke, 1977). Internal consistency
reliability coefficients estimated by means of Kuder-Richardson Formulas
20 and 21 are presented in Table 1 for each domain. It is interesting to
note in the table that tests with as few as six items can have
reliabilities in the .70's.

The 26 domains were then sorted into those having "equal" and those having
"unequal" item difficulties. It was decided that domains having a range of
item difficulty values of .30 or more be designated as having "unequal"
item difficulties; those having a range of less than .30, as having "equal"
difficulties. Therefore, on both Forms A and B, domains 1.1, 1.2, 3.1,
and 3.3 were designated "unequal," as were domains 2.8 on Form A and
Form B; the remaining domains were designated "equal."
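
The designation rule can be stated compactly in code. The sketch below is illustrative only: the item difficulty values and domain labels are hypothetical rather than taken from the SDMT data, and the function names are assumptions. The range is rounded before the comparison so that a range of exactly .30 is not misclassified through floating-point error.

```python
def difficulty_range(p_values):
    """Range of item difficulty (largest p minus smallest p) for one domain."""
    return max(p_values) - min(p_values)


def classify_domain(p_values, cutoff=0.30):
    """Label a domain "unequal" if its difficulty range is .30 or more."""
    # Round to absorb floating-point error, e.g. 0.7 - 0.4 != 0.3 exactly.
    item_range = round(difficulty_range(p_values), 10)
    return "unequal" if item_range >= cutoff else "equal"


# Hypothetical item difficulties for two made-up domains.
domains = {
    "domain X": [0.85, 0.62, 0.48, 0.71],  # range .37
    "domain Y": [0.66, 0.72, 0.58, 0.70],  # range .14
}

for name, p_values in domains.items():
    print(name, round(difficulty_range(p_values), 2), classify_domain(p_values))
```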


All K-R 20 and 21 reliability coefficients were transformed to Fisher's
z coefficients, and differences between these z coefficients were
determined for the "equal" and "unequal" sets of domains. A t-test of the
difference between the means of these K-R 20 - K-R 21 differences for the
two groups of domains was carried out, resulting in a t of 3.63. This t
value is significant at the .01 level.
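
The analysis just described can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of the study's computation: the coefficient pairs are hypothetical, the function names are illustrative, and a pooled-variance (equal-variance) independent-samples t statistic is assumed because the form of the t-test is not specified above.

```python
from math import atanh, sqrt
from statistics import mean, variance


def z_difference(kr20, kr21):
    """Difference between Fisher z transforms of the two coefficients."""
    return atanh(kr20) - atanh(kr21)


def pooled_t(group_a, group_b):
    """Independent-samples t with pooled variance (an assumed test form)."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Hypothetical (K-R 20, K-R 21) pairs for each set of domains.
unequal_pairs = [(0.80, 0.70), (0.75, 0.62), (0.85, 0.78), (0.72, 0.60)]
equal_pairs = [(0.80, 0.79), (0.74, 0.72), (0.83, 0.82), (0.70, 0.69)]

unequal_z = [z_difference(a, b) for a, b in unequal_pairs]
equal_z = [z_difference(a, b) for a, b in equal_pairs]

t, df = pooled_t(unequal_z, equal_z)
print(round(t, 2), df)
```

A positive t here indicates that the mean z difference is larger for the "unequal" set, which is the direction of the effect reported above.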

The results of this study indicate, therefore, that the difference
between K-R 20 and K-R 21 is significantly greater when the range of item
difficulty values is .30 or more than when the range is less than .30.
Nevertheless, it would appear that, even for short tests, K-R 21 is a good
estimate of K-R 20 if the range of item difficulties is relatively narrow.
(A quick glance at Table 1 may not bring this out quite so clearly, since
some of the K-R 20 and K-R 21 coefficients "look" sufficiently close
despite a broad range of item difficulty values. Since correlation
coefficients do not represent an interval scale, absolute differences
between them are not comparable throughout the -1 to +1 range. The closer
the correlations are to -1 or +1, the greater the difference between them
really is. It is for this reason that differences between correlations
must be examined using Fisher's r-to-z transformation.)
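
The effect is easy to see from the transformation itself. In the worked example below, the values of r are illustrative and are not taken from Table 1; the same raw difference of .05 corresponds to a much larger z difference near the top of the scale than near its middle.

```latex
\begin{align*}
z &= \tfrac{1}{2}\ln\!\left(\frac{1+r}{1-r}\right) \\
z(.55) - z(.50) &\approx .618 - .549 = .069 \\
z(.95) - z(.90) &\approx 1.832 - 1.472 = .360
\end{align*}
```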

As for the practical implications of this study, test users should take
note of the method used to estimate a test's reliability, particularly if
"high" test reliability is a major criterion for test selection. If K-R
21 has been used to estimate a test's reliability, the fact that it is a
lower bound of internal consistency reliability must be considered,
particularly when the range of item difficulty is great. Table 2
demonstrates that different test selection decisions can be made purely on
the basis of the particular formula used to estimate test reliability.
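
How the choice of formula can change a selection decision may be illustrated with a small tally. The sketch below is not a reconstruction of Table 2: the coefficient pairs and the cutoff of .70 are hypothetical, and the function name is an assumption.

```python
from collections import Counter


def selection_table(coefficient_pairs, cutoff):
    """Cross-tabulate accept/reject decisions under K-R 20 and K-R 21."""
    counts = Counter()
    for kr20, kr21 in coefficient_pairs:
        by_kr20 = "accept" if kr20 >= cutoff else "reject"
        by_kr21 = "accept" if kr21 >= cutoff else "reject"
        counts[(by_kr20, by_kr21)] += 1
    return counts


# Hypothetical (K-R 20, K-R 21) pairs for five made-up short tests.
pairs = [(0.80, 0.70), (0.75, 0.62), (0.85, 0.78), (0.72, 0.60), (0.68, 0.66)]

table = selection_table(pairs, cutoff=0.70)
for decision in [("accept", "accept"), ("accept", "reject"),
                 ("reject", "accept"), ("reject", "reject")]:
    print(decision, table[decision])
```

In this made-up example, two tests that meet the criterion under K-R 20 would be rejected if only K-R 21 were reported. Because K-R 21 does not exceed K-R 20 for the same data, the reverse case does not arise.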

Table 1. Kuder-Richardson Formula 20 and 21 Reliability Coefficients for
Forms A and B of Stanford Diagnostic Mathematics Test Concept/Skill
Domains for the Blue Level, Equating of Forms Sample (N = 626).

Columns: Concept/Skill Domain; Number of Items; K-R 20 Reliability
Coefficient; K-R 21 Reliability Coefficient; Range of Item Difficulty.

Table 2. Contingency Tables Showing the Number of Stanford Diagnostic
Mathematics Test Blue Level Concept/Skill Domains Accepted and Rejected
on the Basis of Various Reliability Selection Criteria.